Data Set Details

This dataset was retrieved from Kaggle. It was collected by scraping various web resources that advertise used cars for sale in Belarus (Eastern Europe). The data set includes a lot of useful car information, such as manufacturer, model, odometer reading, production year, and transmission. As of the 2nd of December 2019, the dataset contains 38,450 samples from the used car market in Belarus. Data set link

Goal of Analysis

Nowadays the used car market is well developed, and because used cars are highly cost-effective, many people make a used car their first choice when they plan to buy a car. From our observation, people spend a lot of time discussing the prices of heavily used cars and how they age and hold or lose value. Anyone planning to buy a used car has to spend time finding similar cars in the catalog, trying to discover trends, and working out a fair price. More importantly, they have to withstand the stress of deciding on a price. The aim of this research is therefore to take the collected data and find the relationships between a car's attributes (both numerical and categorical), using the most effective tools available to explore them thoroughly.

Distribution of Data

Central Limit Theorem

The central limit theorem (CLT) states that the distribution of sample means approaches a normal distribution as the sample size becomes larger, whatever the shape of the underlying population. In this section we use the cleaned odometer values to illustrate the CLT. Below is a summary of the distribution of the sample means for 10,000 random samples at each of four sample sizes (10, 20, 30, 40).

##        vars     n     mean       sd   median  trimmed      mad      min
## size10    1 10000 240704.8 36873.75 240411.4 240575.1 36853.06  82361.0
## size20    2 10000 241095.9 26368.40 241366.0 241123.6 26508.11 147721.9
## size30    3 10000 240761.0 21379.00 241235.0 240883.7 21380.30 157766.0
## size40    4 10000 240602.1 18428.93 240586.3 240582.8 18638.69 154482.6
##             max    range  skew kurtosis     se
## size10 397370.7 315009.7  0.03    -0.03 368.74
## size20 342238.3 194516.5 -0.02    -0.04 263.68
## size30 328710.4 170944.4 -0.06     0.10 213.79
## size40 321391.2 166908.5  0.02     0.00 184.29
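  • As the summary shows, the sample means for size 40 vary the least: their standard deviation is about 18,429, compared with about 36,874 for size 10. This matches the expected shrinkage of the spread in proportion to 1/sqrt(n), since a four-times larger sample roughly halves the standard deviation.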
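The narrowing spread of the sample means as the sample size grows can be reproduced with a small simulation. The following is a minimal Python sketch, not the code used for this report: the population is a synthetic, right-skewed stand-in for the odometer values, and the helper `sample_means`, the seed, and the number of draws (2,000 per size) are assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(population, size, draws, rng):
    """Return `draws` means, each computed from a random sample of `size` values."""
    return [statistics.fmean(rng.sample(population, size)) for _ in range(draws)]

rng = random.Random(42)
# Synthetic right-skewed population (exponential, mean about 240,000).
# This is an assumption for illustration, NOT the Kaggle odometer data.
population = [rng.expovariate(1 / 240_000) for _ in range(20_000)]

spread = {}
for size in (10, 20, 30, 40):
    means = sample_means(population, size, 2_000, rng)
    spread[size] = statistics.stdev(means)
    print(f"size={size:2d}  mean={statistics.fmean(means):10.1f}  sd={spread[size]:9.1f}")
```

Even though the synthetic population is strongly skewed, the standard deviation of the sample means should fall as the sample size grows, and the value for size 40 should be roughly half of the value for size 10.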

Sampling

Sampling examines a dataset by selecting a portion of it as a representative subset. The processed data are divided into groups, and samples are then drawn from those groups; the methods differ in how the dividing and the selecting are done. Sampling requires a target sample size, which we set to 1000 in this project. We use the odometer values as the population and apply three sampling methods to them.

The first method is simple random sampling without replacement (SRSWOR). Samples are drawn from the larger group so that every object in the frame has the same chance of being picked, and no object is picked twice.

The second method is systematic sampling. Samples are taken at a fixed interval, calculated by dividing the population size by the target sample size. The first sample is chosen at random from within the first interval, and every subsequent sample follows one interval later.

The last method is stratified sampling. The population is divided into strata, subgroups whose members share a similar characteristic, and samples are drawn from each stratum using simple random sampling. In our project we use the transmission attribute as the stratification variable and allocate the odometer sample to each stratum in proportion to its size.

## Stratum 1 
## 
## Population total and number of selected units: 12834 338.0304 
## Stratum 2 
## 
## Population total and number of selected units: 25133 661.9696 
## Number of strata  2 
## Total number of selected units 1000
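The three sampling methods described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the code that produced the output above: the function names, the record layout, the stratum labels, and the synthetic odometer values are all assumptions. Only the two stratum sizes (12834 and 25133) and the sample size of 1000 are taken from the report.

```python
import random
from collections import Counter, defaultdict

def srswor(items, n, rng):
    """Simple random sampling without replacement: every item is equally likely."""
    return rng.sample(items, n)

def systematic(items, n, rng):
    """Take every k-th item after a random start inside the first interval."""
    k = len(items) // n        # sampling interval
    start = rng.randrange(k)   # random position within the first interval
    return [items[start + i * k] for i in range(n)]

def stratified(items, stratum_of, n, rng):
    """Proportional allocation, then SRSWOR inside each stratum."""
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    picked = []
    for label in sorted(strata):
        group = strata[label]
        # Rounded share of n; rounding can make the total differ slightly from n.
        share = round(n * len(group) / len(items))
        picked.extend(rng.sample(group, share))
    return picked

rng = random.Random(0)
# Synthetic records (id, stratum label, odometer). Only the stratum sizes
# come from the report; labels and odometer values are made up.
cars = [(i, "stratum_1" if i < 12834 else "stratum_2", rng.randint(0, 500_000))
        for i in range(12834 + 25133)]

srs_sample = srswor(cars, 1000, rng)
sys_sample = systematic(cars, 1000, rng)
str_sample = stratified(cars, lambda car: car[1], 1000, rng)
per_stratum = Counter(car[1] for car in str_sample)
print(len(srs_sample), len(sys_sample), dict(per_stratum))
```

With these stratum sizes, proportional allocation gives 338.03 and 661.97 units, which the sketch rounds to 338 and 662 so that whole records can be drawn.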

General Analysis of Used Cars

In this part of the research we explore the attributes and gain further insight into the most popular used car brands in the Belarus used car market.

Top 10 selling car brands

  • We first try to find the top 10 selling used car brands in the catalog.

The top 10 brands account for almost two thirds of the catalog. The most popular vehicle manufacturer is Volkswagen, which accounts for 11% of the whole used car market and is the only manufacturer with a share of more than 10%.
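Shares like these come from counting listings per brand and dividing by the total. The snippet below is a minimal Python sketch of that calculation; the helper `top_shares` and the toy list of listings are hypothetical and do not reproduce the catalog figures.

```python
from collections import Counter

def top_shares(values, n):
    """Return the n most common values with their share of all rows,
    plus the combined share of those n values."""
    counts = Counter(values)
    total = sum(counts.values())
    top = [(name, count / total) for name, count in counts.most_common(n)]
    return top, sum(share for _, share in top)

# Made-up toy listings for illustration, NOT the Kaggle catalog.
brands = ["Volkswagen"] * 11 + ["brand_b"] * 7 + ["brand_c"] * 6 + ["brand_d"] * 1
top, combined = top_shares(brands, 2)
for name, share in top:
    print(f"{name}: {share:.1%}")
print(f"combined share of top 2: {combined:.1%}")
```

Applied to the manufacturer column of the real data with `n=10`, the same counting would yield the per-brand and combined shares discussed above.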

Vehicle body color overview

  • It is also interesting to examine the colors of the used cars in the catalog.

The most popular vehicle body color is black, which accounts for 20% of all vehicles; the second is silver, at 17.8%. According to the statistics, the least common color is orange, at only 0.478%. However, many special colors cannot be counted individually and are classified as 'others', so the rarest color in the used car market is certainly not orange.

Vehicle body type overview

  • Next, we find out which body type is most popular in the catalog.

The most popular vehicle body type is the sedan [1], which accounts for 33.8% of all vehicles: about one third of the used car market and almost twice the share of the second most popular type.

Investigation of Vehicle features over the years

Observations

  • Automatic transmission vehicles have appeared in the used car market since 1984, and over time they have gradually overtaken mechanical transmission vehicles. In 2006 the number of automatic transmission vehicles surpassed that of mechanical transmission vehicles for the first time, and it has stayed ahead since then.

  • Gasoline and diesel vehicles make up most of the used car market, but they are not the only fuel types. Gas vehicles have always held a certain market volume, and hybrid-petrol vehicles joined in 2005.

  • Warranty is unlikely to be an important factor for car buyers, because many used cars are many years past production and have high mileage, and almost all vehicles in the market have no warranty. Warranty therefore has negligible value as a reference.
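The crossover year in the first observation can be found by counting listings per production year and transmission type, then looking for the first year in which one type outnumbers the other. The following is a hedged Python sketch of that check; the function `first_year_overtaken` and the toy records are hypothetical, and the toy counts were constructed only so that the crossover falls in 2006.

```python
from collections import Counter

def first_year_overtaken(records, leader, challenger):
    """Earliest year in which `challenger` listings outnumber `leader` listings.

    `records` is a list of (year, transmission) pairs. Returns None if the
    challenger never leads. Only the first crossover is reported; the function
    does not check whether the lead is kept afterwards.
    """
    counts = Counter(records)
    for year in sorted({year for year, _ in records}):
        if counts[(year, challenger)] > counts[(year, leader)]:
            return year
    return None

# Made-up toy counts per year: (mechanical, automatic). NOT the Kaggle data.
toy_counts = {2004: (5, 3), 2005: (4, 4), 2006: (3, 5), 2007: (2, 6)}
records = []
for year, (mechanical, automatic) in toy_counts.items():
    records += [(year, "mechanical")] * mechanical
    records += [(year, "automatic")] * automatic

crossover = first_year_overtaken(records, "mechanical", "automatic")
print(crossover)
```

A tie, as in the toy year 2005, does not count as overtaking because the comparison is strict.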

Conclusion

From all the comparisons made above, the most popular vehicle among the most popular brands is the Passat, priced between 1,900 and 7,200 dollars, with an odometer reading of around 310,000 and a production year of 2006.


  1. A sedan is a 4-door passenger car with a separate trunk built on a three-box body.